Culinary systems, the practice of preparing refined combinations of ingredients that are both palatable
and socially acceptable, are examples of complex dynamical systems. They evolve over time
and are affected by a large number of factors. Modeling the evolution of regional cuisines
may provide a quantitative basis for understanding them and reveal the underlying processes that
have driven them to their present-day state. This is especially important because the potential
culinary space is practically infinite, given the number of possible ingredient combinations that
could form recipes. Such studies also provide a means to compare and contrast cuisines and to
unearth their therapeutic value.

Herein we provide a rigorous analysis of models of eight diverse Indian regional cuisines,
highlighting the uniqueness of each, together with a comparison among those models at the level
of flavor compounds. This opens up molecular-level studies associating cuisines with
non-communicable diseases such as diabetes.

PACS numbers: 89.75.-k, 82.20.Wt, 87.18.Vf, 87.10.Vg, 89.90.+n 


I. INTRODUCTION 

Culinary systems are examples of complex dynamical
systems. Culinary practices, and hence food preparation
procedures (recipes), have evolved into present-day
traditional cuisines by being tuned to suit human
sensibilities. Given the complexity of culinary evolution,
the question is whether it can be modeled to identify
the key elements that drive it.

Lately, culinary science has attracted the attention
of physicists, owing both to invariant patterns observed
across cuisines and to cuisine-specific features that
highlight evolutionary mechanisms whose understanding
facilitates various applications. Understanding the
process of culinary evolution can help bring to the
surface the guiding principles behind the development of
a cuisine.

While the choice of food ingredients and their
combinations is dictated by a range of factors such as
geography, culture, climate and genetics, the sensory
mechanisms of taste (gustatory) and smell (olfactory) play a
dominant role in locking ingredients into recipes.

Various cuisines have been reported to share generic
recipe size distributions and frequency-rank
distributions. Yet these cuisines are unique in preferring
certain ingredients and ingredient combinations. India
has a long culinary history and is characterized by
diverse geographies and climates. While the cuisine of
India could be seen as a single cuisine, it is in fact
a blend of regional cuisines.

Earlier studies have reported most cuisines to have
positive food pairing, i.e. they tend to use ingredient pairs of
similar flavor and taste [2]. Indian cuisine, in contrast, is
reported to have negative food pairing, i.e. Indian recipes
tend to combine ingredients of complementary flavors [3].
This is an important distinction which, apart from
reflecting geoclimatic and cultural differences, says a lot
about the trajectory that Indian cuisine has followed and
perhaps marks specific culinary milestones in shaping
recipe compositions.

Herein we ask the following questions in the context of
Indian regional cuisines: (a) Could a model generate a
cuisine that is similar to the real-world cuisine not only
in terms of its statistical patterns but also in terms of
its flavor profile? (b) What are the dominant factors in
shaping the recipe size distribution?

Towards addressing the above questions we created three
models of culinary evolution: (a) the copy-mutate model
with random fitness (CM-fitness random), (b) the copy-mutate
model with ranked fitness (CM-fitness ranked), and (c) the
copy-mutate-add-delete model (CMAD model). CM-fitness random
serves as a null model and, when compared with CM-fitness
ranked, indicates the role of ingredient rank (frequency
of use) in the food pairing pattern.

In the second section we describe the data acquisition
and correction methods along with the resulting
statistics. The third section explains the methodology of
our models and the parameters involved. The fourth section
describes the authenticity study, carried out to find the
most characteristic ingredients of each regional cuisine.
We then present results on the reproduction of certain
statistical features of the Indian cuisine (and its
regional cuisines).


II. DATA ACQUISITION AND STATISTICS 

We began by extracting data for Indian cuisine from
the website tarladalal.com (November 2014) [12], which
is the largest online repository of recipes for Indian
cuisine. After removing redundant characters
and words (such as contributors’ names) from the recipes, we
were left with 2543 recipes and their corresponding
ingredients. Because ingredients were listed with different
spellings, amounts and forms (chopped, sliced etc.), they
had to be aliased to common names, after which we had
194 ingredients for the whole cuisine. Since the intent
of the current study extended beyond simple statistical
modeling of the cuisine to looking for flavor patterns within
the model cuisine, we also gathered information on the flavor
compounds present in the ingredients. Using existing data
and resources on flavor compounds [13], we ended up with
an overall list of 1170 flavor compounds corresponding
to the above 194 ingredients. Table I lists statistics
of recipes and ingredients in each of the regional cuisines.


III. THE CULINARY EVOLUTION MODELS 

For the purpose of the random copy-mutate model, we
assign to each available ingredient a number selected
uniformly at random from the range [0, 1] as its ‘fitness’
value. In the real world, this fitness value can be taken
as a parameter quantifying the possible preferential
efficacy of an ingredient based on factors such as
availability, nutritional value, relative popularity,
flavor and cost.

The copy-mutate algorithm begins by creating a seed
pool $R_0$ of 20 recipes, each generated by random selection
of S = 7 ingredients from an initial random pool $I_0$ of
10 ingredients. At each time step we select a recipe
randomly from the pool as the ‘mother’ recipe and make a
copy of it for mutation. Within the copied recipe we choose
an ingredient $i$ randomly and compare its fitness value
$f_i$ with the fitness value $f_j$ of another ingredient
$j$, also chosen randomly, from the ingredient pool.
If $f_j > f_i$ we replace the old ingredient ($i$) with the
new one ($j$). This constitutes one mutation step. The
mutation step is carried out M times, after which the
mutated copy is added back to the pool as another possible
candidate for mother recipe in the next time step.

To introduce new ingredients we also check and maintain,
at each time step, a ratio r of the size of the ingredient
pool to the size of the recipe pool. The value of r for the
current study was taken to be this ratio calculated from
the empirical data. If the ratio falls below the required
threshold, new ingredients are introduced into the pool by
random selection from the overall list of available
ingredients.


TABLE I. Statistics of recipes and ingredients in regional
cuisines.

Cuisine          Recipe count   Ingredient count
Bengali          156            102
Gujarati         392            112
Jain             447            138
Maharashtrian    130            93
Mughlai          179            105
Punjabi          1013           152
Rajasthani       126            78
South Indian     474            114

The overall process of recipe selection and mutation is
repeated until we have R recipes, where R equals
the empirical recipe count of 2543. For normalization
purposes, we create 24 such sets of random copy-mutate
recipes and study the overall statistics averaged over all sets.
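
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `copy_mutate` and its parameter names are hypothetical, the number of mutations M is not specified in this section (so `n_mutations=4` is only a placeholder), and skipping a candidate that is already in the recipe is an added assumption to keep ingredients distinct.

```python
import random


def copy_mutate(all_ingredients, fitness, n_recipes, ratio,
                seed_recipes=20, recipe_size=7, seed_ingredients=10,
                n_mutations=4, rng=None):
    """Grow a recipe pool with a copy-mutate scheme (illustrative sketch)."""
    rng = rng or random.Random()
    # Initial ingredient pool I_0 and the ingredients not yet introduced.
    pool = rng.sample(all_ingredients, seed_ingredients)
    unused = [ing for ing in all_ingredients if ing not in pool]
    rng.shuffle(unused)
    # Seed recipe pool R_0: each recipe draws S distinct ingredients from I_0.
    recipes = [rng.sample(pool, recipe_size) for _ in range(seed_recipes)]
    while len(recipes) < n_recipes:
        # Keep |ingredient pool| / |recipe pool| from falling below `ratio`.
        while unused and len(pool) / len(recipes) < ratio:
            pool.append(unused.pop())
        child = list(rng.choice(recipes))  # copy of the 'mother' recipe
        for _ in range(n_mutations):
            pos = rng.randrange(len(child))
            candidate = rng.choice(pool)
            # Replace only if the candidate is fitter (f_j > f_i).
            # Rejecting duplicates is an assumption, not stated in the text.
            if candidate not in child and fitness[candidate] > fitness[child[pos]]:
                child[pos] = candidate
        recipes.append(child)
    return recipes
```

Calling it with fitness values drawn uniformly from [0, 1], `n_recipes=2543` and `ratio=194/2543` would correspond to one set of the random-fitness variant; repeating the call 24 times gives the sets over which statistics are averaged.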

We implemented three different models: 

• Copy-mutate Fitness Random 

In this model, the ‘fitness’ values are assigned to
ingredients uniformly at random. The model thus
starts with no a priori bias about the fitness
of particular ingredients.

• Copy-mutate Fitness Ranked 

In this model, an ingredient is assigned a ‘fitness’
value based on its empirical frequency. Thus, an
ingredient with higher frequency in the real-world
cuisine has a higher fitness value. This model
therefore depends on ingredient fitness that
is ascertained retrospectively.

• Copy-mutate-add-delete Model 

Going beyond the previous models, in which the
size of the recipes in a cuisine is fixed, we built
another model that provides for addition and
deletion of ingredients. In this model,
an additional factor is introduced to select an
ingredient for addition, deletion or mutation at
each time step.
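
The difference between the three variants can be illustrated with a short Python sketch. The helper names (`random_fitness`, `ranked_fitness`, `cmad_step`) and the probabilities `p_add` and `p_delete` are hypothetical. Scaling ranked fitness by the maximum count, choosing mutation with the remaining probability, and never emptying a recipe are assumptions made for the example, since the text does not fix these details.

```python
import random
from collections import Counter


def random_fitness(ingredients, rng):
    """Fitness random: values drawn uniformly from [0, 1]."""
    return {ing: rng.random() for ing in ingredients}


def ranked_fitness(recipes):
    """Fitness ranked: values proportional to empirical frequency of use.

    Dividing by the largest count is an assumption; only the ordering of
    fitness values matters for the comparison f_j > f_i.
    """
    counts = Counter(ing for recipe in recipes for ing in set(recipe))
    top = max(counts.values())
    return {ing: n / top for ing, n in counts.items()}


def cmad_step(recipe, pool, fitness, p_add, p_delete, rng):
    """One copy-mutate-add-delete update of a copied recipe.

    Addition occurs with probability p_add, deletion with p_delete,
    and mutation with the remaining probability.
    """
    child = list(recipe)
    absent = [ing for ing in pool if ing not in child]
    u = rng.random()
    if u < p_add:
        if absent:
            child.append(rng.choice(absent))
    elif u < p_add + p_delete:
        if len(child) > 1:  # assumption: a recipe is never emptied
            child.pop(rng.randrange(len(child)))
    elif absent:
        pos = rng.randrange(len(child))
        candidate = rng.choice(absent)
        if fitness[candidate] > fitness[child[pos]]:
            child[pos] = candidate
    return child
```

Because only `cmad_step` changes the length of a recipe, it is the only variant that can alter the recipe-size distribution.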

IV. INGREDIENT AUTHENTICITY 

Further, in order to understand and highlight the
differences among regional cuisines and the uniqueness of
each one, we carried out a study to find the most authentic
ingredients, i.e. ingredients used more commonly in one
cuisine than in the others. To compute this, we define the
prevalence $P_i^c$ of an ingredient $i$ in a cuisine $c$ as
$P_i^c = n_i^c / N_c$, where $n_i^c$ is the number of recipes
in the cuisine that contain ingredient $i$ and $N_c$ is the
total number of recipes in the cuisine. The relative
prevalence $p_i^c$, which measures the authenticity of
ingredient $i$, is computed as the difference between the
prevalence of $i$ in cuisine $c$ and the average prevalence
of $i$ in all other cuisines:
$p_i^c = P_i^c - \langle P_i^{c'} \rangle_{c' \neq c}$.
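
As an illustration of the prevalence calculation, the following Python sketch computes relative prevalence from a mapping of cuisine name to recipes. The function names and the toy data layout (a dictionary of lists of ingredient lists) are assumptions made for the example, and at least two cuisines are required.

```python
from collections import Counter


def prevalence(recipes):
    """Fraction of recipes in one cuisine that contain each ingredient."""
    counts = Counter(ing for recipe in recipes for ing in set(recipe))
    return {ing: n / len(recipes) for ing, n in counts.items()}


def relative_prevalence(cuisines):
    """Prevalence in a cuisine minus the mean prevalence in all others."""
    prev = {name: prevalence(recipes) for name, recipes in cuisines.items()}
    result = {}
    for name, own in prev.items():
        others = [p for other, p in prev.items() if other != name]
        result[name] = {
            # An ingredient absent from another cuisine has prevalence 0 there.
            ing: value - sum(p.get(ing, 0.0) for p in others) / len(others)
            for ing, value in own.items()
        }
    return result


def most_authentic(cuisines, top=5):
    """Ingredients with the highest relative prevalence in each cuisine."""
    rel = relative_prevalence(cuisines)
    return {name: sorted(scores, key=scores.get, reverse=True)[:top]
            for name, scores in rel.items()}
```

An ingredient that is common everywhere scores near zero under this measure, so widely used staples are ranked below ingredients that are specific to one cuisine.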

V. RESULTS 

All eight regional cuisines reflect the culinary diversity
of Indian culture. This is evident not only from the recipes
belonging to each cuisine but also from the pattern of
ingredient usage, which can be observed by looking at the
most authentic ingredients of the regional cuisines. A list
of the top 5 most authentic ingredients for each of the
regional cuisines is given in Table II.


FIG. 1. Frequency-rank distributions of ingredients for the
Indian cuisine, the corresponding copy-mutate model (‘Fitness
random’, with randomly assigned fitness values for
ingredients) and its variant (‘Fitness ranked’, with
frequency-scaled fitness values for ingredients). The
distributions of both models closely match that of the
real-world cuisine. The logged data were fitted with the
equation $f(x) = a e^{bx}$, giving a coefficient b of 0.5664
for the Indian cuisine, 0.5873 for the fitness random model
and 0.4809 for the fitness ranked model.


A. Frequency-rank distribution 

While the authenticity study highlights the uniqueness of
each regional cuisine, the generic nature of the
frequency-rank distribution has been shown to be an
interesting statistical feature of cuisines around the world.
Its consistency across regional Indian cuisines has been
shown earlier [3], making it a feature of special interest
and an indicator of a generic mechanism of culinary
evolution. We began our study by adopting the copy-mutate
model for the purpose of reproducing this pattern.

Fig. 1 shows the frequency-rank distributions for the
Indian cuisine and the corresponding copy-mutate models
with random fitness values and with fitness values based on
empirical frequency. The figure indicates that the
frequency-rank distribution pattern is reproduced by both
models. This is further supported by the coefficient
values of the exponential fits to the curves, listed in
the figure caption.

The frequency-rank distribution is likewise reproduced
across all eight regional cuisines of India, as shown in
Fig. 2. All the curves were fitted with the equation
$f(x) = a e^{bx}$, and the corresponding values of the
coefficient b are presented in Table III.
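
One common way to fit an exponential of this form is a least-squares line through the logarithm of the data, since ln f = ln a + b x. The Python sketch below does this with rank as x. Whether the fits reported here used rank, log-rank or another variable, and which sign convention was used for b, is not stated, so the sketch shows the technique only and is not expected to reproduce the reported values. The function name `fit_exponential` is hypothetical, and at least two positive frequencies are required.

```python
import math


def fit_exponential(frequencies):
    """Fit f(x) = a * exp(b * x) to a frequency-rank curve.

    Frequencies are sorted in decreasing order and paired with ranks
    1, 2, 3, ...; ln f is then regressed on the rank by least squares.
    """
    freq = sorted((f for f in frequencies if f > 0), reverse=True)
    ranks = range(1, len(freq) + 1)
    logs = [math.log(f) for f in freq]
    n = len(freq)
    mean_x = sum(ranks) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ranks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks, logs))
    b = sxy / sxx                          # slope of the fitted line
    a = math.exp(mean_y - b * mean_x)      # intercept, mapped back from logs
    return a, b
```

With this convention a decaying frequency-rank curve yields a negative b, so a comparison with tabulated coefficients should be made on magnitude unless the convention is known.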


B. Food pairing pattern 

The notion of food pairing is well known in culinary
science. The food pairing hypothesis, that two
ingredients sharing common flavor compounds taste good
together, has been widely investigated in previous
studies. Starting from pairs of ingredients and the
number of flavor compounds they share ($N$), the
calculation of the average flavor sharing of a recipe
($N_s^R$) and that of a cuisine (average $N_s$) is also
well established in these studies. In the current study we
use these calculation methods solely to test whether our
models can regenerate the flavor pairing effect.
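
A minimal Python sketch of the flavor sharing calculation follows. It assumes the usual definition from the food pairing literature, in which the flavor sharing of a recipe is the number of shared compounds averaged over all ingredient pairs in the recipe, and the cuisine value is the mean over recipes. The function names, the `compounds` mapping (ingredient to set of flavor compounds) and the choice to skip single-ingredient recipes are assumptions made for the example.

```python
from itertools import combinations


def recipe_flavor_sharing(recipe, compounds):
    """Mean number of shared flavor compounds over ingredient pairs."""
    pairs = list(combinations(set(recipe), 2))
    if not pairs:
        return 0.0  # a single ingredient has no pairs
    shared = sum(len(compounds[a] & compounds[b]) for a, b in pairs)
    return shared / len(pairs)


def cuisine_flavor_sharing(recipes, compounds):
    """Mean recipe-level flavor sharing over a whole cuisine.

    Recipes with fewer than two distinct ingredients are skipped,
    which is an assumption of this sketch.
    """
    values = [recipe_flavor_sharing(r, compounds)
              for r in recipes if len(set(r)) > 1]
    return sum(values) / len(values)
```

Applying `cuisine_flavor_sharing` to the empirical recipes and to each model cuisine gives the quantities that are compared between the real cuisine and the models.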

While studying and regenerating the frequency-rank
distribution of ingredients is statistically interesting
in itself, can such a model also reproduce empirically
observed flavor sharing patterns? To answer this
question, we began by comparing the average $N_s$
values [2, 3] of both copy-mutate cuisines with that of
the Indian cuisine (Fig. 3). As shown, the model cuisine
with occurrence-based fitness of ingredients has an
average $N_s$ value closer to that of the Indian cuisine,
while the average $N_s$ of the random-fitness model cuisine
is much higher. This indicates that a suitable choice of
fitness values can produce a better model in terms of the
overall flavor effect observed, and further establishes
that certain highly used ingredients play a vital role in
defining the character of the cuisine.

The model was applied to all eight regional
cuisines to check its applicability across cuisines.
Interestingly, for all regional cuisines except Jain and
Rajasthani, the copy-mutate model with ranked fitness
values of ingredients produced an average $N_s$ closer
to the empirical value. This is shown in Fig. 4.

However, the average $N_s$ over an entire cuisine is not
necessarily a strong reflection of the underlying flavor
pattern. We therefore look a level deeper and check the
recipe-level distribution of the $N_s^R$ values. As
indicated by Fig. 5, the distribution of $N_s^R$ values over
the cuisine (averaged over 24 sets in the case of the
copy-mutate models) also gets closer to the empirical
distribution as we move from random fitness to
occurrence-based fitness. The model with random
fitness, as expected, shows a pattern closer to that of
the uniform random model of the cuisine (one in which the
recipe-size distribution is preserved but recipes are
composed of uniformly selected ingredients). This
reinforces our observation that the model with a specific
choice of fitness values is capable of producing a cuisine
comparable with the Indian cuisine in terms of flavor
profile as well.
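
The uniform random control mentioned above can be sketched as follows. The function name is hypothetical and the sketch assumes that ingredients within a recipe are drawn without repetition, which the text does not state explicitly.

```python
import random


def uniform_random_cuisine(recipes, ingredients, rng=None):
    """Control cuisine with the same recipe sizes as the input.

    Each recipe is replaced by the same number of ingredients drawn
    uniformly, without repetition, from the full ingredient list.
    """
    rng = rng or random.Random()
    return [rng.sample(ingredients, len(recipe)) for recipe in recipes]
```

Because every ingredient is equally likely, this control removes any effect of ingredient frequency while keeping the recipe-size distribution intact.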

As another alternative, we tried to create a model
that could recreate the recipe-size distribution of a
cuisine by including probabilistic addition and deletion
of ingredients in recipes, instead of only replacement
of ingredients (mutation). Though exact replication of the
recipe-size distribution could not be achieved, as Fig. 6
indicates, certain probability values of addition, deletion
and mutation yield a recipe-size distribution similar to
that of the real cuisine.


FIG. 2. Frequency-rank distributions for all eight regional cuisines and corresponding models. The distributions for the models
consistently match the empirical data closely in each case.


FIG. 3. Average $N_s$ values of the Indian cuisine and the two copy-mutate models (‘Fitness random’ and ‘Fitness ranked’) and
their z-scores. The model with ranked fitness values produces an average $N_s$ value closer to that of the Indian cuisine than
the random one does. The corresponding statistical significance is shown by the z-score.


TABLE II. Top 5 most authentic ingredients for each of the regional cuisines.

Cuisine          Most authentic ingredients
Bengali          coriander, egg plant, turmeric, milk, ginger garlic paste
Gujarati         asafoetida, green bell pepper, sesame seed, black mustard seed oil, chickpea
Jain             butter, corn grit, banana, tomato, corn
Maharashtrian    turmeric, coconut, cayenne, cinnamon, clove
Mughlai          milk, cardamom, ghee, cream, clove
Punjabi          garam masala, wheat, sunflower oil, cottage cheese, onion
Rajasthani       ghee, fennel, cayenne, chickpea, cumin
South Indian     curry leaf, black bean, black mustard seed oil, rice, tamarind


FIG. 4. Average $N_s$ values of eight Indian regional cuisines, their copy-mutate models (‘Fitness random’ and ‘Fitness ranked’)
and their z-scores.


TABLE III. Values of the fitting coefficient ‘b’ for all eight regional
Indian cuisines and corresponding models.

Cuisine          RC^a      CM-FRand^b   CM-FRank^c
Bengali          0.43U2    0.4842       0.460F
Gujarati         0.5204    0.507        0.5241
Jain             0.4784    0.4947       0.4758
Maharashtrian    0.463     0.4709       0.4601
Mughlai          0.5347    0.4794       0.4794
Punjabi          0.5702    0.4783       0.5075
Rajasthani       0.5761    0.505        0.5382
South Indian     0.5696    0.5321       0.4997

a Regional cuisine
b Copy-mutate fitness random
c Copy-mutate fitness ranked

Interestingly, mutation seems to have been the dominant
factor in the evolution of the Indian cuisine: only
those trials of the CMAD model in which mutation had a
higher probability than addition or deletion gave better
results for the recipe-size distribution. If historical
data were available, this observation could prove useful
in generating a phylogenetic tree of recipes.


VI. CONCLUSIONS 

Quantitative, data-centric analysis of world
cuisines has recently caught the attention of physicists. We
had earlier shown that the Indian cuisine is unique in its
strong negative food pairing pattern [3]. In this study
we focused on models of eight Indian regional cuisines
to probe for mechanisms that might have been dominant
in their evolution over centuries. Our models highlight
the role of ‘ingredient frequency’ in rendering the
characteristic food pairing pattern of Indian recipes.
Further, we looked for the processes that are central to the
recipe-size distribution and observed that the phenomenon
of mutation (replacement of one ingredient with another)
explains the observed pattern very well. Our models and the
corresponding studies highlight the possibility of an
algorithmic way to suggest novel ingredient combinations
as recipes while still maintaining the flavor signature
of a cuisine.


FIG. 5. Cumulative distribution $P(N_s^R)$ vs $N_s^R$ for eight regional cuisines of India and their copy-mutate models.
Panels: (a) Bengali, (b) Gujarati, (c) Jain, (d) Maharashtrian, (e) Mughlai, (f) Punjabi, (g) Rajasthani, (h) South Indian.


FIG. 6. Recipe size distributions for the Indian cuisine and trials of the CMAD model with varying probabilities of addition,
deletion and mutation in recipes.